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ABSTRACT 

A Bayesian analysis of 47 Ursae Majoris (47 UMa) radial velocity data confirms and 
refines the properties of two previously reported planets with periods of 1079 and 
2325 days. The analysis also provides orbital constraints on an additional long period 
planet with a period ~ 10000 days. The three planet model is found to be 10 5 times 
more probable than the next most probable model which is a two planet model. The 
nonlinear model fitting is accomplished with a new hybrid Markov chain Monte Carlo 
(HMCMC) algorithm which incorporates parallel tempering, simulated annealing and 
genetic crossover operations. Each of these features facilitate the detection of a global 
minimum in x 2 . By combining all three, the HMCMC greatly increases the probability 
of realizing this goal. When applied to the Kepler problem it acts as a powerful multi- 
planet Kepler periodogram. 

The measured periods are 1078 ± 2, 2391±g?°, and 14002l^^d, and the corre- 
sponding eccentricities are 0.032 ± 0.014, 0.098±;^, and 0.16t^. The results favor 
low eccentricity orbits for all three. Assuming the three signals (each one consistent 
with a Keplerian orbit) are caused by planets, the corresponding limits on planetary 
mass (Msini) and semi-major axis are 

(2.53±;^Afj, 2.10±0.02au), (0.54±0.07Mj, 3.6±0.1au), and (1.6±% 3 5 Mj, 11.6t|£au), 
respectively. Based on a three planet model, the remaining unaccounted for noise (stel- 
lar jitter) is 5.7m s _1 . 

The velocities of model fit residuals were randomized in multiple trials and pro- 
cessed using a one planet version of the HMCMC Kepler periodogram. In this situation 
periodogram peaks are purely the result of the effective noise. The orbits corresponding 
to these noise induced periodogram peaks exhibit a well defined strong statistical bias 
towards high eccentricity. We have characterized this eccentricity bias and designed a 
correction filter that can be used as an alternate prior for eccentricity, to enhance the 
detection of planetary orbits of low or moderate eccentricity. 

Key words: stars: planetary systems; stars: individual: 47 Ursae Majoris; methods: 
statistical; methods: numerical; techniques: radial velocities. 



1 INTRODUCTION 

Improvements in precision radial velocity (RV) measure- 
ments and continued monitoring are permitting the detec- 
tion of lower amplitude planetary signatures. One example 
of the fruits of this work is the detection of a super earth 
in th e habitable zone surrounding Gliese 581 (lUdrv et al.l 
I2007I ). This and other remarkable successes on the part of 
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the observers is motivating a significant effort to improve 
the statistical tools for analyzing radial velocity data (e.g., 
lLoredo fc ChernofftOOl lLoredol2004l, ICummindl2004l. Gre- 
gory 2005a fc b, Ford 2005 fc 2006. iFord fc Gregor vl [20061 . 
ICumming fc Dragomirll2010h . Much of the recent work has 
highlighted a Bayesian MCMC approach as a way to better 
understand parameter uncertainties and degeneracies and to 
compute model probabilities. 

Gregory (2005a, b, c and 2007a, b, c) presented a 
Bayesian MCMC algorithm that makes use of parallel tem- 
pering (PT) to efficiently explore a large model parameter 
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space starting from a random location. It is able to iden- 
tify any significant periodic signal component in the data 
that satisfies Kepler's laws and thus functions as a Kepler 
periodogram 0. This eliminates the need for a separate pe- 
riodogram search for trial orbital periods which typically 
assume a sinusoidal model for the signal that is only correct 
for a circular orbit. In addition, the Bayesian MCMC algo- 
rithm provides full marginal parameters distributions for all 
the orbital elements that can be determined from radial ve- 
locity data. The algorithm includes an innovative two stage 
adaptive control system that automates the selection of ef- 
ficient Gaussian parameter proposal dis tribution s. 

The latest version of the algorithm, lGregorvl (2009). in- 
corporates a genetic crossover operation into the MCMC al- 
gorithm. The new adaptive hybrid MCMC algorithm (HM- 
CMC) incorporates parallel tempering, simulated annealing 
and genetic crossover operations. Each of these techniques 
was designed to facilitate the detection of a global minimum 
in x 2 ■ Combining all three in an adaptive hybrid MCMC 
grea tly increase the probab ility of realizing this goal. 

iButler fc Marcvl (|l996l ~) first reported a 1090 day com- 
panion to 47 UMa using data from Lick Obse rvatory. With 
additi onal velocity measurements over 13 yr, iFischer et al.l 
(2002) announced a long-period second planet, 47 UMa c, 
with a per i od of 2594 ± 90 days and a mass of 0.76Mj. 
iNaef et all (|2004T ) reported observations from the fiber 
fed echelle spectrograph ELODIE of 47 UMa, and noted 
that the second planet was not e vident in their data. 
IWittenmver. Endl fc Cochran! i|2007l ) reported that there is 
still substantial ambiguity as to the orbital parameters of 
the proposed planetary companion 47 UMa c. They gave a 
period of 7586 day for one orbital solution, and 2594 day fo r 
two others. In their latest work IWittenmver et ah (2009), 
their best-fit 2-planet model now calls for P2 = 9660 days. 
In this paper we present a Bayesian analysis of the latest 
Lick observatory measu rements and a combined Lick plus 
McDonald Observatory jWittenmver et al.r 2009) data set. 

We also report on an investigation of the behavior of the 
Bayesian HMCMC Kepler periodogram to noise. The noise 
data sets were formed by randomly interchanging velocity 
measurements. 



2 THE ADAPTIVE HYBRID MCMC 

The adaptive hybrid MCMC (HMCMC) is a very general 
Bayesian nonlinear model fitting program. After specifying 
the nonlinear model, data and priors, Bayes theorem dic- 
tates the target joint probability distribution for the model 
parameters which can be very complex. To compute the 
marginals for any subset of the parameters it is necessary 
to integrate the joint probability distribution over the re- 
maining parameters. In high dimensions, the principle tool 
for carrying out the integrals is Markov chain Monte Carlo 
based on the Metropolis algorithm. The greater efficiency 
of an MCMC stems from its ability, after an initial burn-in 
period, to generate samples in parameter space in direct pro- 
portion to the joint target probability distribution. In con- 
trast, straight Monte Carlo integration randomly samples 
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the parameter space and wastes most of its time sampling 
regions of very low probability. 

An important feature that prevents the HMCMC from 
becoming stuck in a local probability maximum is parallel 
tempering. Multiple MCMC chains are run in parallel. The 
joint probability density distribution for the parameters (X) 
of model Mi, for a particular chain, is given by 

p(X\D,Mi,I,p) xp(X\Mi,I) xp( J D|A,M,,/) /3 . (1) 

Each MCMC chain corresponding to a different p, with the 
value of P ranging from zero to 1. When the exponent /3 = 1, 
the term on the LHS of the equation is the target joint prob- 
ability distribution for the model parameters, p(X\D, Mi, I). 
It is the posterior probability of a particular choice of param- 
eter vector, X, given the data represented by D, the model 
choice Mi, and the prior information 7. In general, the model 
parameter space of interest is a continuum so p(X\D, Mi, I) 
is a probability density distribution. The first term on the 
RHS of the equation, p(X\Mi, I), is the prior probability 
density distribution of X, prior to the consideration of the 
current data D. The second term, p(D\X Mi, I), is called 
the likelihood and it is the probability that we would have 
obtained the measured data D for this particular choice of 
parameter vector X, model Mi, and prior information I. 
At the very least, the prior information, /, must specify 
the class of alternative models (hypotheses) being consid- 
ered (hypothesis space of interest) and the relationship be- 
tween the models and the data (how to compute the likeli- 
hood). For f urther details of t he likelihood function for this 
problem see ICregorvl ([2005b). In many situations the log 
of the likelihood is simply proportional to the familiar \ 2 
statistic. If we later acquire another data set D' then the 
new prior, p(X\Mi, I'), is equal to our previous posterior, 
p(X\D,M ly I), i.e., I' = I,D. An exponent = 0, yields 
a broader joint probability density equal to the prior. The 
reciprocal of /? is analogous to a temperature, the higher the 
temperature the broader the distribution. 

For parameter estimation purposes 8 chains 
{fi = {0.09,0.13,0.20,0.29,0.39,0.52,0.72,1.0}) were em- 
ployed. At an intervals of 10 iterations, a pair of adjacent 
chains on the tempering ladder are chosen at random and 
a proposal made to swap their parameter states. A Monte 
Carlo acceptance rule determines the probability for the 
proposed swap to occur (e.g., Gregory 2005a, eq. 12.12). 
This swap allows for an exchange of information across the 
population of parallel simulations. In low /9 (higher tem- 
perature) simulations, radically different configurations can 
arise, whereas in higher /? (lower temperature) states, a con- 
figuration is given the chance to refine itself. The lower f3 
chains can be likened to a series of scouts that explore the 
parameter terrain on different scales. The final samples are 
drawn from the ft = 1 chain, which corresponds to the de- 
sired target probability distribution. For (5 < 1, the distri- 
bution is much flatter. The choice of /3 values can be checked 
by computing the swap acceptance rate. When they are too 
far apart the swap rate drops to very low values. 

Each parallel chain employs the Metropolis algorithm. 
At each iteration a proposal to jump to a new location in 
parameter space is generated from independent Gaussian 
proposal distributions (centered on the current parameter 
location), one for each parameter. In general, the cr's of 
these Gaussian proposal distributions are different because 
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Figure 1. Schematic of the operation of the adaptive hybrid MCMC algorithm. 



the parameters can be very different entities. Also if the cr's 
are chosen too small, successive samples will be highly cor- 
related and will require many iterations to obtain an equi- 
librium set of samples. If the cr's are too large, then pro- 
posed samples will very rarely be accepted. The process of 
choosing a set of useful proposal cr's when dealing with a 
large number of different parameters can be very time con- 
suming. In parallel tempering MCMC, this problem is com- 
pounded because of the need for a separate set of Gaussian 
proposal cr's for each chain (different tempering levels). This 
process is auto mated by an innovative two st age statistical 
control system l|Gregory||2007bl . lGregory||2009r ) in which the 
error signal is proportional to the difference between the 
current joint parameter a cceptance rate and a target accep- 
tance rate, typically 25% l|Roberts et al.|[l997h . A schematic 
of the full adaptive control system (CS) is shown in Figure 
1. 

The first stage CS, which involves annealing the set of 
Gaussian proposal distribution cr's, was described in Gre- 
gory 2005a. An initial set of proposal cr's (~ 10% of the 
prior range for each parameter) are used for each chain. Dur- 
ing the major cycles, the joint acceptance rate is measured 
based on the current proposal cr's and compared to a target 
acceptance rate. During the minor cycles, each proposal cr is 
separately perturbed to determine an approximate gradient 
in the acceptance rate for that parameter. The cr's are then 
jointly modified by a small increment in the direction of this 
gradient. This is done for each of the parallel chains. Pro- 
posals to swap parameter values between chains are allowed 
during major cycles but not within minor cycles. 

The annealing of the proposal cr's occurs while the 



MCMC is homing in on any significant peaks in the tar- 
get probability distribution. Concurrent with this, another 
aspect of the annealing operation takes place whenever the 
Markov chain is started from a location in parameter space 
that is far from the best fit values. This automatically arises 
because all the mode l s consi dered incorporate an extra ad- 
ditive noise iGregorvl (|2005bt ). for reasons discussed in Sec- 
tion [3] whose probability distribution is Gaussian with zero 
mean and with an unknown standard deviation s. When the 
\ 2 of the fit is very large, the Bayesian Markov chain au- 
tomatically inflates s to include anything in the data that 
cannot be accounted for by the model with the current set of 
parameters and the known measurement errors. This results 
in a smoothing out of t he de t ailed structure in the \ 2 surface 
and, as pointed out bv lFordl(|2006l ). allows the Markov chain 
to explore the large scale structure in parameter space more 
quickly. The chain begins to decrease the value of s as it 
settles in near the best-fit parameters. An example of this is 
shown in Figure [2] In the early stages s is inflated to around 
38 m s _1 and then decays to a value of « 4 m s" 1 over the 
first 9,000 iterations. This is similar to simulated annealing, 
but does not require choosing a cooling scheme. 

Although the first stage CS achieves the desired joint ac- 
ceptance rate, it often happens that a subset of the proposal 
cr's are too small leading to an excessive autocorrelation in 
the MCMC iterations for these parameters. Part of the sec- 
ond stage CS corrects for this. The goal of the second stage 
is to achieve a set of proposal cr's that equalizes the MCMC 
acceptance rates when new parameter values are proposed 
separately and achieves the desired acceptance rate when 
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Figure 2. The upper panel is a plot of the Logio [Prior X Likeli- 
hood] versus MCMC iteration. The lower panel is a similar plot 
for the extra noise term s. Initially s is inflated and then rapidly 
decays to a much lower level as the best fit parameter values are 
approached. 



ter set can be treated as a set of genes. In the present version, 
one gene consists of the parameter set that specify one or- 
bit. On this basis, a three planet model has three genes. At 
any iteration there exist within the CS the most probable 
parameter set to date X max , and the most probable param- 
eter set from the 8 chains for the most recent iteration X CUI . 
At regular intervals (user specified) each gene from X CUI 
is swapped for the corresponding gene in -X max . If either 
substitution leads to a higher probability it is retained and 
X max updated. The effectiveness of this operation can be 
tested by comparing the number of times the gene crossover 
operation gives rise to a new value of X max compared to 
the number of new X max arising from the normal parallel 
tempering MCMC iterations. The gene crossover operations 
prove to be very effective, and give rise to new X max values 
~ 3 times more often than MCMC operations. Of course, 
most of these swaps lead to very minor changes in probabil- 
ity but occasionally big jumps are created. 

Gene swaps from X CUI 2, the parameters of the second 
most probable current chain, to X max are also utilized. This 
gives rise to new values of X max at a rate approximately half 
that of swaps from X CUT to X max . Crossover operations at 
a random point in the entire parameter set did not prove 
as effective except in the single planet case where there is 
only one gene. Further experimentation with this concept is 
ongoing. 



they are pro posed jointly. D etails of the second stage CS 
were given in lGregorv| |2007b. 

The first stage is run only once at the beginning, but 
the second stage can be executed repeatedly, whenever a sig- 
nificantly improved parameter solution emerges. Frequently, 
the burn-in period occurs within the span of the first stage 
CS, i.e., the significant peaks in the joint parameter proba- 
bility distribution are found, and the second stage improves 
the choice of proposal <r's based on the highest probability 
parameter set. Occasionally, a new higher (by a user speci- 
fied threshold) target probability parameter set emerges af- 
ter the first two stages of the CS are completed. The control 
system has the ability to detect this and automatically re- 
activate the second stage. In this sense the CS is adaptive. 
If this happens the iteration corresponding to the end of the 
control system is reset. The useful MCMC simulation data 
is obtained after the CS are switched off. 

The adaptive capability of the control system can be 
appreciated from an examination of Figure 1. The upper left 
portion of the figure depicts the MCMC iterations from the 8 
parallel chains, each corresponding to a different tempering 
level ft as indicated on the extreme left. One of the outputs 
obtained from each chain at every iteration (shown at the 
far right) is the log prior + log likelihood. This information 
is continuously fed to the CS which constantly updates the 
most probable parameter combination regardless of which 
chain the parameter set occurred in. This is passed to the 
"Peak parameter set" block of the CS. Its job is to decide if a 
significantly more probable parameter set has emerged since 
the last execution of the second stage CS. If so, the second 
stage CS is re-run using the new more probable parameter 
set which is the basic adaptive feature of the CS. 

The CS also includes genetic algorithm block which is 
shown in the bottom right of Figure 1. The current parame- 



3 DATA AND ANALYSIS 

Our initial analysis was based on data obtained at the Lick 
observatory and spans a period of 21.6 years. The data are 
listed in Tables [T] and [2] In addition to the observation time, 
radial velocity {RV), and velocity error (ARV), the detector 
dewar number used is also included. We originally analyzed 
the data ignoring possible residual velocity offsets associated 
with dewar changes (Case A). To investigate how robust the 
results were we subsequently repeated the analysis incorpo- 
rating the dewar velocity offsets as additional unknown pa- 
rameters (Case B). In Case A the data from all 6 dewars are 
used. For Case B we excluded dewar 1 because with only 
a single measurement the analysis is unable to separate the 
offset from the model velocity contribution which reduces 
the time base by 235 days. Results for the two cases follow 
in subsequent sections labeled acc ordingly. In Section [6] wc 
extend the analysis to include the IWittenmver et alj (|2009h 
data from the 9.2 m Hobby-Eberly Telescope (HET) and 2.7 
m Harlam J. Smith (HJS) telescopes of the McDonald Ob- 
servatory. In the rest of this section we describe the model 
fitting equations and the selection of priors for the model pa- 
rameters. We also characterize a noise induced eccentricity 
bias that leads to a second choice for an eccentricity prior. 

We have investigated the 47 UMa data using models 
ranging from a single planet to five planets. For a one planet 
model the predicted radial velocity is given by 

v(U) = V + K[cos{e(U + xP) + u} + ecosu], (2) 

and involves the 6 unknown parameters 

V = a constant velocity. 

K — velocity semi-amplitude. 

P = the orbital period. 

e = the orbital eccentricity. 
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Table 1. Radial velocities (RV) for 47 UMa. The ARV column gives the RV uncertainty and the next column gives 
the detector dewar number. 
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Table 2. Radial velocities (RV) for 47 UMa. The ARV column gives the RV uncertainty and the next column gives 
the detector dewar number. 
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lo = the longitude of periastron. 

X = the fraction of an orbit, prior to the start of data 
taking, that periastron occurred at. Thus, \P — the number 
of days prior to ti — that the star was at periastron, for 
an orbital period of P days. 

9(ti + \P) = the true anomaly, the angle of the star in 
its orbit relative to periastron at time ti. 

We utilize this form of the equation because we obtain 
the dependence of 9 on ti by solving the conservation of 
angular momentum equation 

d9 _ 2-K[l + ecos6(U+ X P)} 2 _ , , 

dt p(l_ e 2)3/2 u ' W 

Our algorithm is implemented in Mathematica and it proves 
faster for Mathematica to solve this differential equation 
than solve the equations relating the true anomaly to the 
mean anomaly via the eccentric anomaly. Mathematica gen- 
erates an accurate interpolating function between t and 9 
so the differential equation does not need to be solved sepa- 
rately for each ti. Evaluating the interpolating function for 
each ti is very fast compared to solving the differential equa- 
tion, so the algorithm should be able to handle much larger 
samples of radial velocity data than those currently available 
without a significant increase in computational time. For ex- 
ample, an increase in the data by a factor of 6.5 resulted in 
only an 18% increase in execution t ime. 

As described in more detail in lGregorvll2007al . we em- 
ployed a re-parameterization of \ an d w to improve the 
MCMC convergence speed motivated by the work of Ford 
(2006). The two new parameters are tj) = 2irx + and 
4> = 2nx — w- Parameter tj) is well determined for all ec- 
centricities. Although cj> is not well determined for low ec- 
centricities, it is at least orthogonal to the ip parameter. We 
use a uniform prior for ip in the interval to 4-7T and uni- 
form prior for (p in the interval — 2n to +2ir. This insures 
that a prior that is wraparound continuous in (x^) maps 
into a wraparound continuous distribution in (ip,<p). The 
big (ip, (j>) square holds two copies of the probability patch 
in (x, w) which doesn't matter. What matters is that the 
prior is now wraparound continuous in (ip, <f>). 

In a Bayesian analysis we need to specify a suitable prior 



for each parameter. These are tabulated in Table [3] For the 
current problem, the prior given in Equation[T]is the product 
of the individual parameter priors. Detailed argume nts for 
the choice of each p rior were given in Gregory 2007a. 

iGregorvl l2007al discussed two different strategies to 
search the orbital frequency parameter space for a multi- 
planet model: (i) an upper bound on fx ^ f% ^ • • • < f n 
is utilized to maintain the identity of the frequencies, and 
(ii) all fi are allowed to roam over the entire frequency 
range and the parameters re-labeled afterwards. Case (ii) 
was found to be significantly more successful at converging 
on the highest posterior probability peak in fewer iterations 
during repeated blind frequency searches. In addition, case 
(ii) more easily permits the identification of two planets in 
1:1 resonant orbits. We adopted approach (ii) in the current 
analysis. 

All of the models considered in this paper incorporate 
an extra noise parameter, s, that can allow for any addi- 
tional noise beyond the known measurement uncertainties^. 
We assume the noise variance is finite and adopt a Gaussian 
distribution with a variance s 2 . Thus, the combination of 
the known errors and extra noise has a Gaussian distribution 
with variance = of +s 2 , where at is the standard deviation of 
the known noise for i th data point. For example, suppose that 
the star actually has two planets, and the model assumes 
only one is present. In regard to the single planet model, the 
velocity variations induced by the unknown second planet 
acts like an additional unknown noise term. Other factors 
like star spots and chromospheric activity can also con- 
tribute to this extra velocity noise term which is often re- 
ferred to as stellar jitter. Several researchers have attempted 
to estimate stellar jitter for individual st ars based on sta- 
tistical correlations wi t h observables (e.g.. lSaar fc Donahue! 
1 19971 . ISaar et al.Hl998l . IWrightl I2005T) . In general, nature is 
more complicated than our model and known noise terms. 



2 In the absence of detailed knowledge of the sampling distribu- 
tion for the extra noise, we pick a Gaussian because for any given 
finite noise variance it is the distribution with the largest uncer- 
tainty as me asured by the entropy, i.e., the maximum entropy 
distribution j Javnedll957l . lGregorvll2005al section 8.7.4.) 



Three Planets in 47 UMa 7 



Table 3. Prior parameter probability distributions. 



Parameter 



Prior 



Lower bound Upper bound 



Orbital frequency 

Velocity Ki 
(m s" 1 ) 

V (m s" 1 ) 

ej Eccentricity 



p(ln /i,ln/ 2 , ■ • • In f n \M n ,I) = [ Mf ^jf L )\r, 
(n =number of planets) 

Modified Jeffreys^ 

(k+Kq)- 1 



Uniform 

a) Uniform 

b) Ecc. noise bias correction filter 



1/1.5 d 1/1000 yr 



(K = l) Xmax ~F= 



ATmax = 2129 



l 

0.99 



uji Longitude of Uniform 2ir 

periastron 

s Extra noise (m s" 1 ) ( f +so) 1 , (so = 1) K m 

1- 



a Since the prior lower limits for K and s include zero, we used a modified Jeffreys prior of the form 

p(X\M, I) = — i— - 1 r- (4) 

X + X m ( X+ ^) 

For X -C Xq, p(X\M, I) behaves like a uniform prior and for X ^> Xq it behaves like a Jeffreys prior. The 



In (l - 



y j term in the denominator ensures that the prior is normalized in the interval to X n 



Marginalizing s has the desirable effect of treating anything 
in the data that can't be explained by the model and known 
measurement errors as noise, leading to conservative esti- 
mates of orbital parameters. See Sections 9.2.3 and 9.2.4 of 
iGregorvl (|2005al ) for a tutorial demonstration of this point. 
If there is no extra noise then the posterior probability dis- 
tribution for s will peak at s — 0. The upper limit on s was 
set equal to -Kmax- We employed a modified Jeffrey's prior 
for s with a knee, sq = lm s" 1 . 

We used two different choices of priors for eccentricity, 
a uniform prior and eccentricity noise bias correction filter 
that is described in the next section. 



3.1 Eccentricity Bias 

When searching for low amplitude orbits any true signal 
has to compete against spurious orbital signals arising from 
noise. It was observed that the majority of the probability 
peaks detected in low signal-to-noise residuals exhibited high 
eccentricities. The upper panel in Figure[3]shows MCMC pe- 
riod parameter versus iteration for a 1 planet model fit to 
residuals (with randomized velocity values) from a 3 planet 
model fit. The lower panel is the same for the eccentric- 
ity parameter. The HMCMC finds many probability peaks 
spread over the full period range. There is no significance to 
the concentration of periods around 100 and 1500 days as 
the location of period concentrations changes markedly in 
other realizations of the velocity randomization. The con- 



centration of eccentricity towards higher values is a regu- 
lar feature. The corresponding plot of eccentricity shows a 
preponderance of high eccentricity values. Figure U shows 
a phase plot for one of these high eccentricity orbits which 
provides further insight into why high eccentricities orbits 
are favored. It is clear that for most of the orbit (e = 0.93) 
the predicted shape is relatively fiat providing an agreeable 
fit to points that fluctuate in an uncorrelated noise like fash- 
ion about some mean. Only for a small portion of the orbit 
does the noise have to conspire to give rise to the rapidly 
changing orbital velocity peak. To mimic a circular velocity 
orbit the noise points would have to appear correlated over 
a larger fraction of the orbit. For this reason it is more likely 
that noise will give rise to spurious highly eccentric orbits 
than low eccentricity orbits. 

To explore this effect more quantitatively we analyzed 
a large number of real data sets where the observing times 
were kept fixed but the velocity residual data was randomly 
reorganized. In each trial we fit a one planet orbit model 
which explored eccentricities in the range to 0.99 using 
the one planet Bayesian Kepler periodogram. In the first 
instance the data used was the 5 planet fit residuals for 55 
Cancri. The data for 55 Cancri was a mixture of Lick and 
Keck observatory data. When the residual velocities were 
randomized the error associated with a particular velocity 
was shifted with its velocity because the quoted errors were 
very different for the two observatories. The red curve in 
the left panel of Figure [5] is the average of 5 different 55 
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Figure 3. The upper panel shows MCMC period parameter ver- 
sus iteration for a 1 planet model fit to residuals (with randomized 
velocity values) from a 3 planet model fit. The lower panel is the 
same for the eccentricity parameter. 



Cancri randomized residuals trials. The green curve is the 
average of 4 trials of randomized residuals from a 2 planet 
47 UMa model fit, and the blue curve the average of 8 trials 
of randomized residuals from a 3 planet 47 UMa model fit. 
All three curves are very similar and indicate a strong noise 
induced eccentricity bias towards high eccentricities. 

To increase the chance of detecting and defining the 
parameters of low and moderate eccentricity orbits we have 
constructed an eccentricity noise bias correction filter from 
the reciprocal of the average of the three eccentricity bias 
curves just mentioned. The lower panel of Figure[S]shows the 
best fit polynomial (dashed curve) to the reciprocal of the 
mean of the three eccentricity bias curves (red points). After 
normalizing the best fit polynomial so the integral is equal 
to unity over the search range (e = to 0.99), we obtain the 
eccentricity noise bias correction filter (solid black curve). 
This becomes our second option for a choice of prior for 
eccentricity. The probability density function for this filter 
(solid black curve) is given by 



pdf(e) = 1.3889- 



-1.5212e 2 +0.53944e 3 - 



1.6605(e-0.24821) 8 .(5) 



On the basis of our understanding of the mechanism un- 
derlying the noise induced eccentricity bias, we expect the 
eccentricity prior filter to be generally applicable to searches 
for low amplitude orbital signals in other precision radial ve- 
locity data sets. 

An obvious further question that remains to be explored 
is to what extent the observed distribution of published or- 
bital eccentricities is influenced by such a bias. 
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Figure 4. A typical high eccentricity orbit (in this case e = 0.93) 
found from an MCMC fit of a 1 planet model to residuals with 
randomized velocities. The upper panel shows the raw data points 
plotted versus two cycles of period phase and the lower panel 
shows binned averages. 




0.4 0.6 
Eccentricity 

Figure 5. The upper panel shows the marginal probability densi- 
ties for the eccentricity parameter obtained from MCMC 1 planet 
fits to randomized residuals from 47 UMa 2 planet model fits 
(green), 3 planet (blue), and 55 Cancri 5 planet (red) fits. The 
green curve is the average of 4 trials, the blue curve is the aver- 
age of 8 trials and the red curve is the average of 5 trials. The 
lower panel shows the best fit polynomial (dashed curve) to the 
reciprocal of the mean of the three eccentricity bias curves (red 
points). After normalization this yields the eccentricity noise bias 
correction filter (solid black curve). 
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Figure 6. Panel (a) shows the Lick Observatory observations of 
47 UMa. Panel (b) shows the final Case A best 3 planet model fit 
compared to the data and panel (c) shows the residuals. 



4 RESULTS (CASE A) 

For Case A, the dewar velocity offsets with respect to our 
reference dewar # 24 are assumed to be zero. 




0.5 1.0 1.5 

Iterations (x 10 6 ) 




0.5 1 .0 

Iterations (x 10 s ; 



Figure 7. Plot of Logio [Prior X Likelihood] (upper) and period 
(lower) versus MCMC iteration for a 2 planet model. 
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Figure 8. A plot of eccentricity versus period for the 2 planet fit 
(Case A). 



4.1 Parameter estimation 

In this section we present the results of an exploration of 
the 47 UMa data with the multi-planet HMCMC Kepler 
periodogram starting with a one planet model and extending 
to a five planet model. The data for 47UMa is shown in 
Figure [6] panel (a). Panel (b) shows our final best 3 planet 
model fit compared to the data and panel (c) shows the 
residuals. 

The one planet model turned up the 1080 day period 
which is clearly visible by eye in the raw data. We do not 
show any results for that model except to compute the 
marginal likelihood for model selection purposes which is 
presented in Section \4. 2 1 

Figure [7| shows a plot of Logio [Prior x Likelihood] (up- 
per) and period (lower) versus HMCMC iteration (every 
200 th point) for a 2 planet model. The starting periods of 4.7 
and 1080 days are shown on the left hand side of the lower 
plot at a negative iteration number. The burn-in period of 
approximately 70,000 iterations is clearly discernable. 

Figure [8] shows a plot of eccentricity versus period for a 



sample of the HMCMC parameter samples for the 2 planet 
model. Since the duration of the data set is only 7906 days, it 
is not surprising that uncertainties on the parameters of the 
second orbit are very large. On the basis of a 2 planet model, 
the parameters of the second planet are P 2 = 7952±iHd and 
e-2 — 0.431qq8- It is clear that e 2 has a low eccentricty tail 
which reaches zero for a value of P 2 « 9500d. This agrees 
with t he value of P 2 = 9660d found by IWittenmver et all 
(2009) in their best-fit 2-plane t model where they f ixed e 2 = 
0.005, the values proposed bv lFischer et alj (|2002l ). 

Figure [5] shows plots of the 3 period parameters versus 
HMCMC iteration for a 3 planet model □ with Logio [Prior 
x Likelihood] plotted above. A new period of 2300d has 
emerged and the longest period has shifted from 7952d to 
~ lOOOOd and this feature is considerably broader. The 
starting periods of 89, 1080, 7200d are shown on the left at 



3 Note: the HMCMC runs shown here used the eccentricity prior 
based on the eccentricity noise bias correction filter discussed in 
Section 13.11 The results obtained using a uniform eccentricity 
prior are qualitatively the same. 
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Figure 9. Plot of Logio [Prior X Likelihood] (upper) and period 
(lower) versus HMCMC iteration for a 3 planet fit. 
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Figure 10. Plot of period versus HMCMC iteration for a 3 planet 
fit. In this case the start periods were 5, 20, 100 days. 



a negative iteration nu mber. Previou s experience with the 
HMCMC periodogram l|Gregorvll2009l ) indicate that it is ca- 
pable of finding a global peak in a blind search of parameter 
space for a three planet model. Figure [TD] shows the results 
of a blind search starting from 3 very different periods of 5, 
20, lOOd. The algorithm readily finds the same set of final 
periods in both cases. Figure QTJ shows a plot of eccentric- 
ity versus period for a sample of the HMCMC parameter 
samples for the 3 planet model. There is a large uncertainty 
in the eccentricity of the two largest periods which extends 
down to very low eccentricities. 

Figure [T2l shows the marginal probability distributions 
for the periods, eccentricities and K values for the three or- 
bits found. The tenth plot is s, the a of the added white 
noise term. A summary of the 3 planet model parameters 
and their uncertainties are given in Tabled] The parameter 
value listed is the median of the marginal probability dis- 
tribution for the parameter in question and the error bars 




3000 10000 30000 

Periods 



Figure 11. A plot of eccentricity versus period for the 3 planet 
HMCMC (Case A). 



identify the boundaries of the 68.3% credible region 0. The 
value immediately below in parenthesis is the maximum a 
posteriori (MAP) value, the value at the maximum of the 
joint posterior probability distribution. It is not uncommon 
for the MAP value to fall close to the borders of the cred- 
ible region. In one case, the period of the third planet, the 
MAP value falls outside the 68.3% credible region which is 
one reason why we prefer to quote median values as well. 
The marginal for P3 is so asymmetric we also give the mode 
which is 9991 days. The semi-major axis and Msini values 
are derived from the mo del parameters assu ming a stellar 
mass of 1.063+o;°29 M i|Takeda et al.ll2007h . The quoted 
errors on the semi-major axis and M sin i include the uncer- 
tainty in the stellar mas s. 

The lGelman-Rubinl |l992) statistic is typically used to 
test for convergence of the parameter distributions. In paral- 
lel tempering MCMC, new widely separated parameter val- 
ues are passed up the line to the j3 = 1 simulation and 
are occasionally accepted. Roughly every 100 iterations the 
j3 = 1 simulation accepts a swap proposal from its neigh- 
boring simulation. The final j3 = 1 simulation is thus an 
average of a very large number of independent /3 — 1 sim- 
ulations. What we have done is divide the 13 = 1 iterations 
into 12 equal time intervals and inter-compared the 12 dif- 
ferent essentially independent average distributions for each 
parameter using a Gelman- Rubin test. For all of the three 
planet model parameters the Gelman-Rubin statistic was 
< 1.07. 

Figure [T3] shows a plot of eccentricity versus period for 
a 4 planet model. A well defined fourth period of 37O.8I2Q 
days and eccentricity of 0.571q'j5 was detected in repeated 
HMCMC trials. The amplitude was K = 5.0ti;?m s" 1 . The 
significance of this period is discussed further in Sections l4.2l 
and [6] 

Finally, a 5 planet model was also attempted. In addi- 
tion to the 4 periods found by the 4 planet model, a variety 



4 In practice, the probability density for any parameter is repre- 
sented by a finite list of values pi representing the probability in 
discrete intervals X. A simple way to compute the 68.3% credible 
region, in the case of a marginal with a single peak, is to sort 
the pi values in descending order and then sum the values un- 
til they approximate 68%, keeping track of the upper and lower 
boundaries of this region as the summation proceeds. 
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Figure 12. A plot of parameter marginal distributions for a 3 planet HMCMC (Case A). 
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Figure 13. A plot of eccentricity versus period for the 4 planet 
HMCMC (Case A). 
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Figure 14. A plot of eccentricity versus period for a 3 planet 
HMCMC fit of the 3 planet simulation. 



of probability peaks at other periods were observed but none 
were deemed significant. 



4-1.1 Simulation test 

As a test of our overall methodology, we simulated data for 
a 3 planet model based on the MAP values from the fit to 



the real data for the Case A analysis. The data was sampled 
at the real observation times and had added independent 
Gaussian noise with a a — yj (e^) 2 + s 2 , where is the 
quoted measurement error for the \ th point and s, the extra 
noise parameter, was 4.4m s _1 . Figure [14] shows a plot of 
eccentricity versus period for a sample of the HMCMC pa- 
rameter samples for the 3 planet model fit to the simulated 
data set. Again, the starting period values for the HMCMC 
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Table 4. Three planet model parameter estimates (Case A). 



Parameter 


planet 1 


planet 2 


planet 3 


P (d) 


1079. 6ljg 
(1079.2) 


2319^76 
(2278) 


-, 00 ,,,-1-4030 

13346+™^ 
(21342) 
mode= 9991 


K (m s" 1 ) 


Bo.i±i:i 

(50.3) 


J - ± -i.o 
(9.6) 


13.7±i"| 
(13.2) 


e 


0.014+;°°° 
(0.012) 


0.33l;? 7 
(0.48) 


0.29tj} 
(0.44) 


u (deg) 


380±S 
(345) 


222+J} 
(222) 


162±« 
(111) 


a (au) 


2 10+ 02 
z,iu -.02 

(2.10) 


3.60+;" 
(3.46) 


(15.4) 


M sini (Mj) 


2.63+/°? 
(2.64) 


(0.566) 


i.58± : g 

(1.69) 


Periastron 
passage 


11967+* 5 * 
(11943) 


11914.6+^6 
(11930) 


12655+^4 
(12047) 



(JD - 2,440,000) 



were 5, 20, and 100 days, a long way from the expected val- 
ues. Comparison with Figure [TT] indicates that the results 
for the actual data and 3 planet simulation are qualitatively 
very similar. 

To test whether the fourth period in the Lick data (pe- 
riod = 370.8?J2.o days) is a window function artifact of the 
sampling times, we analyzed two 3 planet simulations with a 
4 planet model. In both cases the HMCMC found the 3 pe- 
riods expected from the simulation. No well defined fourth 
period was found and the peak amplitude was K = 3m 
s _1 compared with a K — 5m s _1 for the real data set. This 
suggests that the fourth period is not simply a window func- 
tion artifact. However, later HMCMC fits of a combination 
of Lick and Mcdonald Observatory data did not confirm this 
period. 

4.2 Model selection 

On of the great strengths of Bayesian analysis is the built- 
in Occam's razor. More complicated models contain larger 
numbers of parameters and thus incur a larger Occam 
penalty, which is automatically incorporated in a Bayesian 
model se lection analysis in a quantitative fashion (see for 
example, lGregorvl[2005al . p. 45). The analysis yields the rel- 
ative probability of each of the models explored. 

To compare the posterior probabilities of the i th planet 
model to the one planet models we need to evaluate the 
odds ratio, On = p(Mi\D, I)/p(M 1 \D, I), the ratio of the 
posterior probability of model Mi to model Mi. Application 
of Bayes's theorem leads to, 

p(Mj\I) p(D\Mj,I) = p(Mj\I) 
12 p(M x \I) p(D\M u I) ~p(Mi|J) 12 K) 

where the first factor is the prior odds ratio, and the second 
factor is called the Bayes factor, Ba. The Bayes factor is 



the ratio of the marginal (global) likelihoods of the models. 
The marginal likelihood for model Mi is given by 

p(D\Mi,I) = JdXp(X\Mi,I) xp(D\X,Mi,I). (7) 

Thus Bayesian model selection relies on the ratio of marginal 
likelihoods, not maximum likelihoods. The marginal likeli- 
hood is the weighted average of the conditional likelihood, 
weighted by the prior probability distribution of the model 
parameters and s. This procedure is referred to as marginal- 
ization. 

The marginal likelihood can be expressed as the prod- 
uct of the maximum like liho od and the Occa m penalty (see 
iGregorv fc Loredolll992l and lGregorvll2005al . page 48). The 
Bayes factor will favor the more complicated model only 
if the maximum likelihood ratio is large enough to over- 
come this penalty. In the simple case of a single parameter 
with a uniform prior of width AX, and a centrally peaked 
likelihood function with characteristic width 5X, the Oc- 
cam factor is ~ SX/ AX. If the data is useful then generally 
SX <C AX. For a model with m parameters, each parame- 
ter will contribute a term to the overall Occam penalty. The 
Occam penalty depends not only on the number of param- 
eters but also on the prior range of each parameter (prior 
to the current data set, D), as symbolized in this simplified 
discussion by AX. If two models have some parameters in 
common then the prior ranges for these parameters will can- 
cel in the calculation of the Bayes factor. To make good use 
of Bayesian model selection, we need to fully specify priors 
that are independent of the current data D. The sensitiv- 
ity of the marginal likelihood to the prior range depends on 
the shape of the prior and is much greater for a un iform 
prior than a Jeffreys prior (e.g., see ICregorvl 120053 , page 
61). In most instances we are not particularly interested in 
the Occam factor itself, but only in the relative probabili- 
ties of the competing models as expressed by the Bayes fac- 
tors. Because the Occam factor arises automatically in the 
marginalization procedure, its effect will be present in any 
model selection calculation. Note: no Occam factors arise in 
parameter estimation problems. Parameter estimation can 
be viewed as model selection where the competing models 
have the same complexity so the Occam penalties are iden- 
tical and cancel out. 

The MCMC algorithm produces samples which are in 
proportion to the posterior probability distribution which 
is fine for parameter estimation but one needs the propor- 
tional ity constant for es timating the model marginal likeli- 
hood. IClvde et al.l (|2006l ) recently reviewed the state of tech- 
niques for model select ion from a statistics perspective and 
iFord fc Gregory! (|2006l ) have evaluated the performance of 
a variety of marginal likelihood estimators in the extrasolar 
pla net contex t. 

Gregory l|2007d ). in the analysis of velocity data for HD 
11964, compared the results from three marginal likelihood 
estimators: (a) parallel tempering, (b) ratio estimator, and 
(c) restricted Monte Carlo. Monte Carlo (MC) integration 
can be very inefficient in exploring the whole prior param- 
eter range because it randomly samples the whole volume. 
The fraction of the prior volume of parameter space contain- 
ing significant probability rapidly declines as the number of 
dimensions increases. For example, if the fractional volume 
with significant probability is 0.1 in one dimension then in 
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17 dimensions the fraction might be of order 10 -17 . In re- 
stricted MC integration (RMC) this should be much less of a 
problem because the volume of parameter space sampled is 
restricted to a region delineated by the outer borders of the 
marginal distributions of the parameters. For HD 11964, the 
three methods were compared for 1, 2 and 3 planet models. 
For the one planet model all three methods agreed within 
15%. For the two planet model the three methods agreed 
within 28% with the RMC giving the lowest estimate. For 
the three planet model the estimates were very different. 
The RMC estimate was 16 time smaller than the PT esti- 
mate and the ratio estimator was 18 times larger than the 
PT estimate. The PT method is very compute intensive. 
For a three planet model 40 tempering levels and 10 7 itera- 
tions were required. The problem becomes more difficult for 
larger numbers of planets. Thus for 3 or more planet models 
accurately computing the marginal likelihood is a very big 
challenge. 

In this work we consider only RMC marginal likelihood 
estimates. This method is expected to underestimate the 
marginal likelihood in higher dimensions and this underesti- 
mate is expected to become worse the larger the number of 
model parameters, i.e. increasing number of planets. When 
we conclude, as we do, that the RMC computed odds in fa- 
vor of the three planet model compared to the two planet 
model is ~ 10 17 we mean that the true odds is 10 17 . 

In earlier work, we defined the outer boundary of pa- 
rameter space for RMC integration based on the 99% cred- 
ible region. One problem is that if there is a significant con- 
tribution to the integral within say the 30% credible region, 
the volume in this region can be such a small fraction of the 
total that no random sample lands in that region. In this 
work we use a nested version of RMC integration. Multi- 
ple boundaries were constructed based on credible regions 
ranging from 30% to ^ 99%, as needed. We are then able 
to compute the contribution to the total RMC integral from 
each nested interval and sum these contributions. For ex- 
ample, for the interval between the 30% and 60% credible 
regions, we generate random parameter samples within the 
60% region and reject any sample that falls within the 30% 
region. Using the remaining samples we can compute the 
contribution to the RMC integral from that interval. 

The left panel of Figure [15] shows the contributions 
from the individual intervals for 5 repeats of the nested 
RMC evaluation for the 2 planet model. The right panel 
shows the summation of the individual contributions ver- 
sus the volume of the credible region. The credible region 
listed as 9995% is defined as follows. Let Xugg and Xl99 
correspond to the upper and lower boundaries of the 99% 
credible region, respectively, for any of the parameters. Sim- 
ilarly, Xu95 and Xl95 are the upper and lower boundaries of 
the 95% credible region for the parameter. Then X1J9995 = 

XjJ99 + (Xu99 — Xu95) and X.L9995 = Xl99 + (Xl99 — Xl95)- 

Similarly, Xj/9984 = Xu99 + (Xu99 — Xus4)- 

Table [5] gives the Marginal likelihood estimates, Bayes 
factors and false alarm probabilities for 0, 1, 2, 3, and 4 
planet models which are designated Mo, •■•,Ma- The last 
two columns list the MAP value of extra noise parameter, s, 
and the RMS residual. For each model the RMC calculation 
was repeated 5 times and the quoted errors give the spread 
in the results, not the standard deviation. The Bayes factors 
that appear in the third column are all calculated relative 



to model 2. Examination of a plot like the one shown in 
Figure [T5l but for the 4 planet model, indicates that RMC is 
probably seriously underestimating the marginal likelihood. 
A better method of computing this quantity is sorely needed. 

We can readily convert the Bayes factors to a Bayesian 
False Alarm Probability (FAP). For example, in the context 
of claiming the detection of a 3 planet model the FAP is the 
probability that there are actually 2 or less planets. 



FAP = ^^(prob.of i planets) 



(8) 



If we assume a priori (absence of the data) that the 
prob of 1 planet model = prob. of 2 planet model — etc., 
then probability of each model is related to the Bayes factors 
by 

Bi2 



p(Mi I D, I) 



E5r b j2 



(9) 



where JV mo d is the total number of models considered, and 
of course B22 = 1. Given the Bayes factors in Tableland 
substituting into equation [8] gives 



FAP : 



(-B02 + B\2 + B2 



10" 



(10) 



For the 3 planet model we obtain a very low FAP ~ 10 -17 . 
The Bayesian false alarm probabilities for 1, 2, 3, and 4 
planet models are given in the fourth column of Table [5] 

In the context of claiming the detection of a 4 planet 
model the Bayesian false alarm probability is ~ 0.5. This is 
very high and does not justify a claim for the detection of 
a fourth planet. The fourth period is also suspiciously close 
to one year to be of concern. 



5 RESULTS (CASE B) 

For Case B, we incorporated 4 additional parameters to 
allow for the unknown residual velocity offsets of dewars 
6, 8, 39, and 18 relative to dewar 24. These are labeled 
Vq, V&, V39, Vi8, where the subscript denotes the detector de- 
war. In a Bayesian analysis these are treated as additional 
nuisance parameters which we can marginalize. Addition- 
ally, since they are of interest to the observers we also pro- 
vide a summary of each residual offset parameter. In the ra- 
dial velocity data processing pipeline every effort was made 
to insure the dewar velocity offsets were allowed for so the 
residuals are expected to be small. For the Case B analysis 
we have assumed a Gaussian prior for each Vi centered on 
zero with a a — 3 km s _1 . 



5.1 Parameter estimation 

In this section we redo the analysis of the 47 UMa data with 
the multi-planet HMCMC Kepler periodogram starting with 
a one planet model and extending to a four planet model. 
The data is same as shown in Figure HJ] panel (a) with the 
exception of the first point corresponding to dewar 1. 

Figure [16] shows a plot of eccentricity versus period for 
a sample of the HMCMC parameter samples for the 2 planet 
model for Case B. The two planet model again favors a sec- 
ond period in the range 8100-15000d (68% credible region) 
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Table 5. Marginal likelihood estimates, Bayes factors and false alarm probabilities for (Case A) 0, 1, 2, 3, and 4 planet models which 
are designated Mo, ■ • ■ , M4. The last two columns list the MAP value of extra noise parameter, s, and the RMS residual. 



Model 


Periods 
(d) 


Marginal 
Likelihood 


Bayes factor 
nominal 


False Alarm 
Probability 


s 

(m s _1 ) 


RMS residual 
(m s — 1 ) 


M 




2.63 X 10" 481 


1Q -127 




34.8 


35.3 




(1080) 


(7.51 ± 0.07) x 10" 394 


10 -39 


10 -88 


11.2 


12.5 


Ah 


(1080, 8000) 


(4.1 ±0.5) x 10" 355 


1.0 


10 -39 


(5.1 


8.1 


M 3 


(1080, 2300, ~ 10000) 


(4^ /5 ) x io-338 


10 17 


ht 17 


4.4 


6.5 


Mi 


(371,1080, 2300, ~ 10000) 


( 4 xI/ 2 ) >< 10 " 338 


10 17 


0.5 


3.7 


6.1 




Log 10 | Restricted Monte Carlo parameter volume] Log 10 | Restricted Monte Carlo parameter volume] 



Figure 15. Left panel shows the contribution of the individual nested intervals to the RMC marginal likelihood for the 2 planet model. 
The right panel shows the integral of these contributions versus the parameter volume of the credible region. Note: only the relative 
values of the units on the vertical axes of these two plots arc meaningful. 
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Figure 16. A plot of eccentricity versus period for the 2 planet 
fit (Case B). 
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Figure 17. A plot of eccentricity versus period for the 3 planet 
HMCMC (Case B). 



with a long higher eccentricity tail extending to much longer 
periods. In Case B the time base is 235 days shorter than 
Case A so the lower eccentricity /lower period end is less 
well defined but otherwise there is general agreement. This 
model was run twice using different starting periods but the 
two planet HMCMC run did not favor a period around 2240 
days even when the two starting periods used were 1078 and 
2240 days, respectively. This is not that surprising given the 
relative sizes of the K values for planets 2 and 3 in Table [4] 

Figure [17] shows a plot of eccentricity versus period for 
a sample of the HMCMC parameter samples for the 3 planet 



model for Case B. Again, we see the emergence of a period 
of ~ 2250 days and the third longer period appears bet- 
ter defined (compared to the 2 planet model) and extends 
to much lower eccentricities. Qualitatively, there is general 
agreement with the Case A results shown in Figure [TT1 The 
dewar residual offset velocities were V§ = 0.07ztlg, ^8 = 
1.712.3, V39 = -3.2l£!, and Vi 8 = -l.ll^ m s" 1 '. 

The 4 planet HMCMC analysis again showed a clear 
fourth period of 372 1 _'i. 3 with an eccentricity of 0.73 ± 0.14. 
We did not compute the marginal likelihood for the 4 planet 
model but based on the Case A results the Bayesian false 
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alarm probability for a 4 planet model is expected to be very 
high. 

5.2 Model selection (Case B) 

We repeated the Bayesian false alarm probability for the 
3 planet model as described in Section 14.21 for the Case B 
analysis which incorporates the dewar residual offset param- 
eters. 



FAP 



(B02 + B12 + B22) 



J2U R 



(11) 



J "2 



The computed Bayes factors are B02 = 1.6 x 10 



, B12 



2.0 x 10~ 28 ,B 2 2 = 1-0, -B 3a = 2.0 x 10 J . This gives a false 
alarm probability of 5.0 x 10~~ 6 . Even though this is much 
greater than the value found in Case A it still argues strongly 
for favoring a 3 planet model. 



6 DISCUSSION 

On the basis of the model selection results we can con- 
clude there is strong evidence for three planets although the 
longest period orbital parameters are still not well defined. 
The results for the Lick only analysis do not rule out low 
eccentricity orbits for all three planets. The major difference 
produced by including the dewar residual offset parameters 
was to reduce the Bayesian false alarm probability for a 3 
planet model from ~ 10~ 17 to to ~ 10~ 5 . A significant part 
of this reduction might be a consequence of the reduced span 
of the data set by 235 days for the Case B analysis. 

Our results appear to be entirely co nsistent with the 
latest analysis of lWittenmver et al.l (|200ST ) . Their best-fit 2- 
planet model now calls for P2 = 9660 days. They note that 
to fit a second planet, the y fixed the pa r amete rs e 2 and u>2 
at the values proposed bv lFischer et al.l (|2002T ): e2 = 0.005 
and l)2 = 127. In our Case A two planet fit (Figure [S|, in 
which all parameters were free, the eccentricity versus period 
plot exhibits a low eccentricity tail which occurs at a period 
between 9000 and 10000 days, directly comparable to their 
9660 day period. The ~ 2300 day period in the Lick data 
only shows up in our 3 planet and higher models. This is 
probably because the longer period signal with a K — 13.8 
m s" 1 dominates over the 2 300 day period signal wi th a 
K — 8.0 m s _1 (see Table © . IWittenmver et all lj2009T l did 
not report any results on fitting a 3 planet model. 

To test this further we combined the Lick data with the 
IWittenmver et all (|2009T ) data from the 9.2 m Hobby-Eberly 
Telescope (HET) and 2.7 m Harlam J. Smith (HJS) tele- 
scopes of the McDonald Observatory. We subtracted initial 
offset velocities of 23.3 and 25.4 m s _1 based on a compari- 
son of plots of the HET and HJS data sets to the Lick data. 
We then included a free parameter for each telescope to al- 
low for an unknown residual velocity offset compared with 
the Lick dewar 24 in the same way as we had done for the 
other Lick dewars in Case B. 

Figure [TS] shows a plot of eccentricity versus period for 
our 3 planet HMCMC fit to the combined data set. The 
three starting periods used for the HMCMC run were 10, 
1078, & 6000 days. The residual velocity offset parameters 
for the HET and HJS telescopes were 1.5t{'° and -0.2 ± 1 
m s" 1 , respectively. It is clear from the figure that the same 
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Figure 18. A plot of eccentricity versus period for a 3 planet 
HMCMC fit of the combined Lick, HET, and HJS telescope data 
set. 



three periods appear as before but with the extra data the 
results now favor low eccentricity orbits for all three peri- 
ods. This is a particularly pleasing result as low eccentricity 
orbits are more likely to exhibit long term stability than 
high eccentricity orbits. The preference for low eccentrcity 
orbits is more apparent in the marginal distributions shown 
in Figure [19] 

Our final orbital parameters are summarized in Table [S] 
together with the residual offset velocities and the extra 
noise term s. Again, the parameter value listed is the me- 
dian of the marginal probability distribution for the param- 
eter in question and the error bars identify the boundaries 
of the 68.3% credible region. The value immediately below 
in parenthesis is the MAP value, the value at the maximum 
of the joint posterior probability distribution. 

The final period phase plots are shown in Figure 1201 
The top left panel shows the data and model fit versus 1078 
day orbital phase after removing the effects of the two other 
orbital periods. The red and green curves are the mean HM- 
CMC model fit +1 standard deviation and mean model fit 
— 1 standard deviation, respectively. The dashed curve is 
the mean HMCMC fit. The other two panels correspond 
to phase plot for the other two periods. In each panel the 
quoted period is the mode of the marginal distribution. The 
P2 and P3 phase coverage for the combined HET and HJS 
data (not shown) is not sufficient to warrant a fully inde- 
pendent search for these two periods. 

HMCMC fits of a 4 planet model to the combined Lick, 
HET and HJS data set failed to detect a well defined fourth 
period casting doubt on the validity of the 370.8^2 day pe- 
riod detected in the Lick only data. Even though this period 
was well defined in the Lick only data, the Bayesian false 
alarm probability of ~ 0.5 is much too high to warrant any 
claim of significance. The period is also suspiciously close to 
one year and might be an artefact of the data reduction. 



6.1 Eccentricity bias 



In Section [3TT] we showed that HMCMC periodogram peaks 
exhibit a well defined statistical bias towards high eccen- 
tricity in the absence of a real periodic signal. To mimic a 
circular velocity orbit the noise points need to be correlated 
over a larger fraction of the orbit than they do to mimic a 
highly eccentric orbit. For this reason it is more likely that 
noise will give rise to spurious highly eccentric orbits than 
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Figure 19. A plot of parameter marginal distributions for a 3 planet HMCMC of the combined Lick, HET, and HJS telescope data set. 
The residual offset velocity parameters are relative to the Lick dewar 24. They are designated Vj, where j = 6, 8, 39, 18 correspond to the 
other Lick dewars and subscripts HET and HJS refer to the Hobby-Eberly Telescope and Harlam J. Smith telescopes llWittenmver et al.l 
I2009F ). 



low eccentricity orbits. Is there a similar or stronger bias 
when there is a real periodic signal? Based on the above 
explanation of the bias we would expect noise to conspire 
to increase the eccentricity of detected periodogram peaks 
associated with the real periodic signals. Our expectation is 
that the importance of this bias will be dependent on the 
strength of the signal and possibly on the number of ob- 
served periods Q. For very strong signals like the 1078 day 
period we would expect the bias to be very small. For very 
weak signals the bias might well be approximated by the 
no real periodic signal eccentricity bias which we quanti- 



5 This will be the subject of a future investigation. 



fied earlier. As we have seen, in the case of the 47 UMa 
~ 2300 day period, the Lick data alone favors an eccentric- 
ity of ~ 0.3, even when we include the eccentricity bias filter. 
When we added more data the eccentricity was noticeably 
reduced. What if we simulated a Lick only data set for a 3 
planet model based on the MAP 3 planet model parameters 
for the combined Lick, HET, and HJS analysis. Would the 
HMCMC analysis of the simulated data favor higher eccen- 
tricities, possibly indicating that there is some additional 
eccentricity bias operating. To test for this we carried out 
this simulation but modified the MAP parameter values so 
all three eccentricities were identically zero and P3 = 10000 
days. Also, no residual offsets were included for this test so 
the analysis corresponds to Case A. 
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Table 6. Final 3 planet model parameter estimates from the HM- 
CMC fit of the combined Lick, HET, and HJS telescope data set. 



Parameter 


planet 1 


planet 2 


planet 3 


P (d) 


10781?, 
(1078) 


2391±£°° 
(2430) 


14002+ 4018 
(47831) 
moQe — iizoi 


K (m s- 1 ) 


48 4+ - 8 
(48.2) 


8.0ti;° 

(8.3) 


13.8l|j 
(13.5) 


e 


0.032+;°i* 
(0.038) 


n nQS+- 047 
0.098_ og6 

(0.020) 


0-16l;?e 
(0.67) 


w (deg) 


334±g 
(324) 


295+ 114 
(356) 


110 +132 

(110) 


a (au) 


2.100l : ° 2 
(2.10) 


3.6+-J 
(3.6)' 


11.61- 
(26.3) 


M sini (Mj) 


2 5S+' 07 
(2.53) 


0.540!;^ 
(0.567) 


1 64+- 29 
(1.86) 


Periastron 
passage 

(JD - 2,440,000) 


(11888) 


124411™ 
(12778) 


uraelS™? 

(11736) 


V 6 (m s" 1 ) 


1 1+ 2 ' 8 

± - 1 -2.9 
(4.0) 


V 8 (m s- 1 ) 


-0 6+ 2 6 

U.D_ 2 g 

(1.0) 


V39 (m s" 1 ) 


(-0.5) 


Vis (m s- 1 ) 


-5 1+ 1 ' 7 
S - i -1.6 

(-4.6) 


Vhet (m s" 1 ) 


1 5 +1 '° 
(1.3) 


Vhjs (m s _1 ) 


-0 2+ 1 ' 
(0.1) 


s (m s _1 ) 


r 7 +0.3 
°- ' -0.3 

(5.3) 








Pi Orbital phase 



P 2 Orbital phase 




P 3 Orbital phase 

Figure 20. The top left panel shows the data and model fit versus 
1078 day orbital phase after removing the effects of the two other 
orbital periods. The red and green curves are the mean HMCMC 
model fit +1 standard deviation and mean model fit —1 standard 
deviation, respectively. The dashed curve is the mean HMCMC 
fit. The other two panels correspond to phase plot for the other 
two periods. 
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Figure 21. A plot of eccentricity versus period for the 3 planet 
HMCMC fit of the 3 planet simulation. 



Figure [2T] shows a plot of eccentricity versus period for 
the simulation. The starting period values for the HMCMC 
were 5, 20, and 100 days, a long way from the expected val- 
ues. Again, all three simulated periods were detected and 
the preferred eccentricities are all close to zero but with sig- 
nificant tails extending to higher eccentricity. Based on this 
test there does not appear to be any clear additional eccen- 
tricity bias operating. The fact that the real Lick data alone 
favor (in Case A and Case B) somewhat larger eccentricities 
for P2 and P3 suggests there may be something else present 
in the real data, possible some low level systematic effect 
or other real signals. In this regard, the eccentricity of the 
longer period was considerably higher in the 2 planet models 
than when allowance was made for an additional period in 
the 3 planet models. 



7 CONCLUSIONS 

In this paper, we have demonstrated that a Bayesian adap- 
tive hybrid MCMC (HMCMC) analysis of a challenging 
data set has helped clarify the number of planets present 
in 47UMa. HMCMC integrates the advantages of parallel 
tempering, simulated annealing and the genetic algorithm. 
Each of these techniques was designed to facilitate the de- 
tection of a global minimum in x 2 • Combining all three in an 
adaptive hybrid MCMC greatly increase the probability of 
realizing this goal. The adaptive Bayesian hybrid MCMC is 
very general and can be applied to many different nonlinear 
modeling problems. It has been implemented in gridMath- 
ematica on an 8 core PC. The increase in a speed for the 
parallel implementation is a factor 6.6. When applied to the 
Kepler problem it corresponds to a multi-planet Kepler pe- 
riodogram which is ideally suited for detecting signals that 
are consistent with Kepler's laws. However, it is more than 
a periodogram because it also provides full marginal poste- 
rior distributions for all the orbital parameters that can be 
extracted from radial velocity data. The execution time for 
a 1 planet blind fit (7 parameters) is 10 6 iterations per hr. 
The program scales linearly with the number of parameters 
being fit. 

The 47UMa data has been analyzed using 1, 2, 3, 4, and 
5 planet models. On the basis of the model selection results 
we can conclude there is strong evidence for three planets 
based on a Bayesian false alarm probability of 5.0 x 10 -6 , 
however, the longest period orbital parameters are still not 
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well defined. The measured periods (based on the combined 
data set) are 1078 ± 2, 239llg?°, and 14002±|g||d ! and the 
corresponding eccentricities are 0.032 ± 0.014, 0.098^Qgg, 
and 0.161' °g. The results favor low eccentricity orbits for all 
three. Note: the longer time base of the full Lick data set 
favors a value for P3 at the lower end of the 68% credible 
region of ~ 10, 000 days. Assuming the three signals (each 
one consistent with a Keplerian orbit) are caused by planets, 
the corresponding limits on planetary mass (Msini) and 
semi-major axis are (2.53^QgMj, 2.10 ± 0.02au), 
(0.54±0.07A'/j, 3.6±0.1au), and (1.6t°- 3 5 Mj, lLel^au), 
respectively. Based on our three planet model results, the 
remaining unaccounted for noise (stellar jitter) is 5.7m s _1 . 

A four planet model fit to the Lick data yielded a well 
defined fourth period of 37O.8I2 Q days and eccentricity of 
0-571q 15, but the combined data set did not yield a well de- 
fined fourth period. Even though this period was well defined 
in the Lick only data, the Bayesian false alarm probability of 
« 0.5 is much too high to warrant any claim of significance. 
The period is also suspiciously close to one year and might 
be an artefact of the data reduction. 

The velocities of model fit residuals were randomized 
in multiple trials and processed using a one planet version 
of the HMCMC Kepler periodogram. In this situation peri- 
odogram probability peaks are purely the result of the effec- 
tive noise. The orbits corresponding to these noise induced 
periodogram peaks exhibit a well defined statistical bias to- 
wards high eccentricity. We have characterized this eccen- 
tricity bias and designed a correction filter that can be used 
as an alternate prior for eccentricity to enhance the detec- 
tion of planetary orbits of low or moderate eccentricity. On 
the basis of our understanding of the mechanism underlying 
the eccentricity bias, we expect the eccentricity prior filter to 
be generally applicable to searches for low amplitude orbital 
signals in other precision radial velocity data sets. 
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